Abstract: This paper considers minimum sum mean-squared error (sum-MSE) linear transceiver designs in multiuser downlink systems with imperfect channel state information. Specifically, we derive the optimal energy allocation between the training and data phases for such a system. Under MMSE estimation of uncorrelated Rayleigh block fading channels with equal average powers, we prove the separability of the energy allocation and transceiver design optimization problems. A closed-form optimum energy allocation is derived and applied to existing transceiver designs. Analysis and simulation results demonstrate the improvements that can be realized with the proposed design.

I. Introduction 

Transceiver designs that minimize the sum of mean squared errors (sum-MSE) under a sum power constraint in the multiuser downlink with full channel state information (CSI) at the base station are well researched [?], [?], [?], [?]. In these papers, an uplink-downlink duality is used to transform a non-convex downlink problem into an equivalent convex virtual uplink problem. Recent studies [?], [?], [?] have extended these original papers to the case of imperfect CSI, deriving an MSE duality in the presence of channel estimation errors and providing robust transceiver designs.

In order to design precoders, the base station must obtain estimates of the channel coefficients. If channel reciprocity holds (i.e., the uplink and downlink channels are identical up to a transpose), these estimates can be provided by training in the uplink (e.g., using uplink sounding, as in the WiMAX standard [?]). However, in frequency division duplex systems (and in some broadband time division duplex systems [?]), channel reciprocity does not apply. In this case, channel estimation must be performed in the downlink and the estimates communicated back to the base station using an uplink feedback mechanism. In this paper, we consider imperfect CSI estimation at the mobile receivers, but assume that the imperfect estimates are also available at the base station (via an error-free and delay-free feedback mechanism).¹

The algorithms designed in [?], [?], [?] for minimization of the sum-MSE under a sum-power constraint presume that fixed channel estimation error variances $\sigma_{E_k}^2$ are provided by a predetermined estimation mechanism. In this paper, we address the problem of jointly designing a training sequence for MMSE CSI estimation and designing linear transceivers for minimum sum-MSE communication. We consider the optimum allocation of limited available energy between the training and data communication phases for each quasi-static communication block.

¹In this regard, this work complements [?], where we consider perfect receiver CSI estimates and a feedback mechanism incorporating prediction, error, and delay.

In Section II, we describe the channel model under consideration and review the design of training sequences for MMSE channel estimation. We then present the linear precoding system model and provide an overview of the design of minimum sum-MSE linear precoders with imperfect CSI and fixed transmit power. In Section III, we formulate the joint design problem for energy allocation and precoder design. We present a closed-form solution for the optimum training energy and apply the result to existing precoder design techniques. The performance and behaviour of the proposed approach are illustrated in Section IV, and we draw conclusions in Section V. Appendix A derives the MMSE channel estimation error variance, and the calculations of our main proof are presented in Appendix B.

Notation: We use the following conventions: italics represent scalars, lower case boldface type is used for vectors, and upper case boldface represents matrices (e.g., $x$, $\mathbf{x}$, $\mathbf{X}$, respectively). Entries in vectors and matrices are denoted as $[\mathbf{x}]_i$ and $[\mathbf{X}]_{i,j}$. The superscripts $(\cdot)^T$ and $(\cdot)^H$ denote the transpose and Hermitian operators. $E[\cdot]$ represents the statistical expectation operator, while $\mathbf{I}_N$ is the $N \times N$ identity matrix. $\|\mathbf{x}\|_1$ and $\|\mathbf{x}\|_2$ denote the 1-norm (the sum of entries, for the nonnegative power vectors used in this paper) and the Euclidean norm, $\operatorname{diag}(\mathbf{x})$ represents the diagonal matrix formed using the entries in vector $\mathbf{x}$, and $\operatorname{diag}[\mathbf{X}_1, \ldots, \mathbf{X}_K]$ is the block diagonal concatenation of matrices $\mathbf{X}_1, \ldots, \mathbf{X}_K$. The $\operatorname{vec}(\mathbf{X})$ operator stacks the columns of the matrix $\mathbf{X}$ in a single vector. $\mathcal{CN}(\mathbf{m}, \mathbf{R})$ denotes the complex multivariate Gaussian probability distribution with mean $\mathbf{m}$ and covariance matrix $\mathbf{R}$.

II. System Model and Background 

A. Channel Model 

In the linear precoding system illustrated in Fig. 1, a base station with $M$ antennas transmits to $K$ decentralized mobile users with $N_k$ antennas each over flat wireless channels. The channel between the transmitter and user $k$ is represented by the $N_k \times M$ matrix $\mathbf{H}_k^H$, and the overall $N \times M$ channel matrix is $\mathbf{H}^H$, with $\mathbf{H} = [\mathbf{H}_1, \ldots, \mathbf{H}_K]$, where $N = \sum_k N_k$ is the total number of receive antennas in the system. We assume that all channel coefficients are i.i.d. and drawn from a zero-mean complex Gaussian distribution with variance $\sigma_h^2$; that is, $\operatorname{vec}(\mathbf{H}) \sim \mathcal{CN}(\mathbf{0}, \sigma_h^2 \mathbf{I}_{MN})$. We consider a quasi-static (block fading) channel model, where the channel coefficients are assumed to be fixed for a coherence interval of $n$ consecutive symbol periods. The first $n_T$ transmissions in each block are training symbols, which the mobile receivers use to estimate the downlink channel; these imperfect CSI estimates are assumed to be available at the base station via an error-free and delay-free feedback mechanism. We consider the stochastic error model (as used in [?], [?], [?]), where the true channel is modelled as the sum of the estimated channel and an independent additive error term, $\mathbf{H}_k = \hat{\mathbf{H}}_k + \mathbf{E}_k$, with $\operatorname{vec}(\mathbf{E}_k) \sim \mathcal{CN}(\mathbf{0}, \sigma_{E_k}^2 \mathbf{I}_{MN_k})$, and $\mathbf{E} = [\mathbf{E}_1, \ldots, \mathbf{E}_K]$.

Fig. 1. Data processing for user $k$ in downlink and virtual uplink.
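The following minimal Python/NumPy sketch draws one coherence block under the stochastic error model above, which helps fix the dimensions. It makes two assumptions that go beyond the text. First, a common error variance $\sigma_E^2$ is given for every user. Second, $\hat{\mathbf{H}}_k$ and $\mathbf{E}_k$ are drawn independently, so $\hat{\mathbf{H}}_k$ has per-entry variance $\sigma_h^2 - \sigma_E^2$ and the true channel keeps variance $\sigma_h^2$, as holds for MMSE estimates. The helper names `crandn` and `sample_block` are hypothetical.

```python
import numpy as np

def crandn(rng, shape, var):
    """Circularly-symmetric complex Gaussian samples with per-entry variance `var`."""
    return np.sqrt(var / 2) * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))

def sample_block(rng, M, N_list, sigma_h2, sigma_e2):
    """One coherence block under H_k = Hhat_k + E_k (hypothetical helper).

    Returns lists of M x N_k matrices; user k's downlink channel is H_k^H.
    Hhat_k and E_k are independent, so Hhat_k gets variance sigma_h2 - sigma_e2
    per entry and the true H_k keeps variance sigma_h2.
    """
    assert 0 <= sigma_e2 <= sigma_h2
    H_hat = [crandn(rng, (M, Nk), sigma_h2 - sigma_e2) for Nk in N_list]
    E = [crandn(rng, (M, Nk), sigma_e2) for Nk in N_list]
    H = [Hh + Ek for Hh, Ek in zip(H_hat, E)]
    return H, H_hat, E

rng = np.random.default_rng(0)
H, H_hat, E = sample_block(rng, M=4, N_list=[2, 2], sigma_h2=1.0, sigma_e2=0.1)
H_all = np.hstack(H)          # M x N, i.e. H = [H_1, ..., H_K]
print(H_all.shape)            # (4, 4)
```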

B. MMSE Channel Estimation and Training 

Training sequence and estimator design can be simplified under the assumption of uncorrelated channel coefficients by considering training for the vector channels from the $M$ transmit antennas to each individual receive antenna. To simplify notation in this section, we consider training for a single vector channel $\mathbf{h}^H$. Channel estimation is performed by transmitting a set of $n_T$ training signal vectors, $\mathbf{X}_T = [\mathbf{x}_{T,1}, \ldots, \mathbf{x}_{T,n_T}]$, from the $M$ transmit antennas without precoding. At least $M$ training symbol vectors ($n_T \geq M$) must be sent to guarantee resolvability of the individual channel coefficients. The received signal vector is $\mathbf{y}_T^H = \mathbf{h}^H \mathbf{X}_T + \mathbf{z}^H$, where $\mathbf{z} \sim \mathcal{CN}(\mathbf{0}, \sigma_n^2 \mathbf{I}_{n_T})$, and the MMSE channel estimate $\hat{\mathbf{h}}_{\text{MMSE}}^H = \mathbf{y}_T^H \mathbf{A}_0$ is found using the linear MMSE estimator
$$\mathbf{A}_0 = \left(\mathbf{X}_T^H \mathbf{X}_T + \frac{\sigma_n^2}{\sigma_h^2} \mathbf{I}_{n_T}\right)^{-1} \mathbf{X}_T^H.$$
Under the sum energy constraint $\operatorname{tr}[\mathbf{X}_T \mathbf{X}_T^H] \leq E_T$, where $E_T$ is the energy allocated to training, and the assumption of independent channel coefficients, a sufficient condition for optimality of the training matrix is $\mathbf{X}_T \mathbf{X}_T^H = \frac{E_T}{M} \mathbf{I}_M$ [?]; that is, we are free to select any training matrix with orthogonal rows of equal norm. When using the MMSE estimator, there is no benefit in using more than $n_T = M$ training symbols. For algorithmic simplicity, we choose the set of training vectors $\mathbf{X}_T = \sqrt{E_T/M}\, \mathbf{I}_M$. One may also choose $\mathbf{X}_T$ as the scaled size-$M$ DFT matrix, $[\mathbf{X}_T]_{m,n} = \frac{\sqrt{E_T}}{M} e^{j 2\pi m n / M}$, which has the additional benefit of balancing the training power equally over each transmit antenna in each training symbol.
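Both training matrices described above can be checked directly. The sketch below uses NumPy and hypothetical helper names. It builds both matrices and verifies three properties: the optimality condition $\mathbf{X}_T\mathbf{X}_T^H = (E_T/M)\mathbf{I}_M$, the energy budget $\operatorname{tr}[\mathbf{X}_T\mathbf{X}_T^H] = E_T$, and the per-antenna power balance of the DFT choice.

```python
import numpy as np

def identity_training(E_T, M):
    """X_T = sqrt(E_T/M) I_M: one antenna active per training symbol."""
    return np.sqrt(E_T / M) * np.eye(M)

def dft_training(E_T, M):
    """Scaled size-M DFT: [X_T]_{m,n} = sqrt(E_T)/M * exp(j 2 pi m n / M)."""
    m = np.arange(M)
    return (np.sqrt(E_T) / M) * np.exp(2j * np.pi * np.outer(m, m) / M)

E_T, M = 8.0, 4
for X in (identity_training(E_T, M), dft_training(E_T, M)):
    # Both satisfy the optimality condition X_T X_T^H = (E_T/M) I_M ...
    assert np.allclose(X @ X.conj().T, (E_T / M) * np.eye(M))
    # ... and spend exactly the training energy budget tr[X_T X_T^H] = E_T.
    assert np.isclose(np.trace(X @ X.conj().T).real, E_T)

# Only the DFT choice spreads each symbol's energy evenly over the M antennas.
print(np.abs(dft_training(E_T, M)) ** 2)   # every entry is E_T / M**2 = 0.5 (up to rounding)
```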

In Appendix A, we show that the estimation errors of all channel coefficients have equal variance under the assumption of i.i.d. channels with variance $\sigma_h^2$, taking the value
$$\sigma_E^2 = \frac{\sigma_h^2 \sigma_n^2}{\sigma_n^2 + \sigma_h^2 E_T / M}. \tag{1}$$
As we illustrate in Section II-D, the assumption of equal estimation error variance is critical in maintaining convexity of the virtual uplink sum-MSE minimization problem.
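As a sanity check of the error variance, the sketch below estimates the MSE of the linear MMSE estimator $\mathbf{A}_0$ by Monte Carlo simulation. It compares the result with the closed form $\sigma_h^2\sigma_n^2/(\sigma_n^2 + \sigma_h^2E_T/M)$. Row vectors mirror the notation $\hat{\mathbf{h}}^H = \mathbf{y}_T^H\mathbf{A}_0$. The function names and parameter values are illustrative assumptions.

```python
import numpy as np

def mmse_error_variance(E_T, M, sigma_h2, sigma_n2):
    """Closed-form per-coefficient MMSE error variance."""
    return sigma_h2 * sigma_n2 / (sigma_n2 + sigma_h2 * E_T / M)

def empirical_error_variance(X, sigma_h2, sigma_n2, trials=20000, seed=0):
    """Monte Carlo MSE of hhat^H = y_T^H A_0 for an M x n_T training matrix X.

    Rows of `hH` are independent draws of h^H (conjugation does not change
    the i.i.d. circular Gaussian distribution).
    """
    rng = np.random.default_rng(seed)
    M, n_T = X.shape

    def cn(var, shape):
        return np.sqrt(var / 2) * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))

    # A_0 = (X^H X + (sigma_n2 / sigma_h2) I_{n_T})^{-1} X^H via a linear solve
    A0 = np.linalg.solve(X.conj().T @ X + (sigma_n2 / sigma_h2) * np.eye(n_T), X.conj().T)
    hH = cn(sigma_h2, (trials, M))
    yH = hH @ X + cn(sigma_n2, (trials, n_T))      # y_T^H = h^H X_T + z^H
    err = yH @ A0 - hH
    return np.mean(np.abs(err) ** 2)               # averaged over trials and coefficients

E_T, M, sigma_h2, sigma_n2 = 2.0, 4, 1.0, 0.5
X = np.sqrt(E_T / M) * np.eye(M)
print(empirical_error_variance(X, sigma_h2, sigma_n2), mmse_error_variance(E_T, M, sigma_h2, sigma_n2))
```

Any training matrix satisfying $\mathbf{X}_T\mathbf{X}_T^H = (E_T/M)\mathbf{I}_M$, such as the scaled DFT, should give the same error variance.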

C. Linearly Precoded Data Communication Model 

Following training, we assume that all of the remaining $n_D = n - M$ symbol periods in each block will be used to broadcast data symbols. Under the block fading assumption, the channel $\mathbf{H}$ does not change during these $n_D$ data transmissions; thus, we can design a single precoder/decoder pair to be used for all transmissions in the block. It follows that the remaining available energy to be used for data ($E_D = E_{\max} - E_T$) should be divided equally over the $n_D$ data transmissions, resulting in a maximum per-symbol transmit power $P_D = (E_{\max} - E_T)/n_D$.

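During each data transmission, user $k$ receives $L_k$ data symbols $\mathbf{x}_k = [x_{k1}, \ldots, x_{kL_k}]^T$ from the base station, and the vector $\mathbf{x} = [\mathbf{x}_1^T, \ldots, \mathbf{x}_K^T]^T$ comprises independent symbols with unit average energy ($E[\mathbf{x}\mathbf{x}^H] = \mathbf{I}_L$, where $L = \sum_{k=1}^K L_k$). User $k$'s data streams are precoded by the $M \times L_k$ transmit filter $\mathbf{U}_k = [\mathbf{u}_{k1}, \ldots, \mathbf{u}_{kL_k}]$, where $\mathbf{u}_{kj}$ is the precoding beamformer for stream $j$ of user $k$ with $\|\mathbf{u}_{kj}\|_2 = 1$, and the precoders are combined in the $M \times L$ global transmit precoder matrix $\mathbf{U} = [\mathbf{U}_1, \ldots, \mathbf{U}_K]$. Power is allocated to user $k$'s data streams in the vector $\mathbf{p}_k = [p_{k1}, \ldots, p_{kL_k}]^T$ with $\mathbf{P}_k = \operatorname{diag}[\mathbf{p}_k]$, and we define the downlink power allocation matrix as $\mathbf{P} = \operatorname{diag}[\mathbf{p}_1, \ldots, \mathbf{p}_K]$ with $\operatorname{tr}[\mathbf{P}] \leq P_D$. Based on this model, user $k$ receives a length-$N_k$ vector $\mathbf{y}_k^{DL} = \mathbf{H}_k^H \mathbf{U}\sqrt{\mathbf{P}}\mathbf{x} + \mathbf{n}_k$, where the superscript indicates the downlink, and $\mathbf{n}_k \sim \mathcal{CN}(\mathbf{0}, \sigma_n^2 \mathbf{I}_{N_k})$. To estimate its $L_k$ symbols $\mathbf{x}_k$, user $k$ applies the $L_k \times N_k$ receive filter $\mathbf{V}_k^H$, yielding the estimated symbols $\hat{\mathbf{x}}_k^{DL} = \mathbf{V}_k^H \mathbf{H}_k^H \mathbf{U}\sqrt{\mathbf{P}}\mathbf{x} + \mathbf{V}_k^H \mathbf{n}_k$.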
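The downlink signal model can be sketched in a few lines of NumPy to make the dimensions concrete. The precoder $\mathbf{U}$, the equal power split and the receive filters below are random placeholders, chosen only to exercise the dimensions for $K = 2$, $M = 4$ and $N_k = L_k = 2$. They are not the sum-MSE design discussed next.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, L = 4, [2, 2], [2, 2]          # BS antennas, receive antennas and streams per user
P_D, sigma_n2 = 1.0, 0.1

def cn(shape, var=1.0):
    return np.sqrt(var / 2) * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))

H = [cn((M, Nk)) for Nk in N]        # user k sees the N_k x M channel H_k^H
# Placeholder precoder: random unit-norm beamformers u_kj (NOT an optimized design)
U = cn((M, sum(L)))
U /= np.linalg.norm(U, axis=0, keepdims=True)
p = np.full(sum(L), P_D / sum(L))    # equal stream powers, so tr[P] = P_D
sqrtP = np.diag(np.sqrt(p))
x = (rng.choice([-1, 1], sum(L)) + 1j * rng.choice([-1, 1], sum(L))) / np.sqrt(2)  # unit-energy QPSK

x_hat = []
for k, Nk in enumerate(N):
    y_k = H[k].conj().T @ U @ sqrtP @ x + cn(Nk, sigma_n2)   # y_k^DL, length N_k
    V_k = cn((Nk, L[k]))                                     # placeholder receive filter, N_k x L_k
    x_hat.append(V_k.conj().T @ y_k)                         # L_k estimated symbols
print([xh.shape for xh in x_hat])    # [(2,), (2,)]
```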

In order to design the sum-MSE minimizing precoder for the downlink, we use the virtual uplink, also illustrated in Fig. 1, where each matrix is replaced by its conjugate transpose. We emphasize that the virtual uplink is only a mathematical construct to be used for precoder design, and that its use does not require reciprocity of the true uplink and downlink channels. We imagine that transmissions from mobile user $k$ in the virtual uplink propagate via the flipped channel $\mathbf{H}_k$ to the base station. The transmit and receive filters for user $k$ become $\mathbf{V}_k$ and $\mathbf{U}_k$, respectively, with normalized precoding beamformers, i.e., $\|\mathbf{v}_{kj}\|_2 = 1$, and the uplink precoder matrices are gathered as a block diagonal matrix $\mathbf{V} = \operatorname{diag}[\mathbf{V}_1, \ldots, \mathbf{V}_K]$. Power is allocated to user $k$'s data streams as $\mathbf{q}_k = [q_{k1}, \ldots, q_{kL_k}]^T$, with $\mathbf{Q}_k = \operatorname{diag}[\mathbf{q}_k]$, $\mathbf{Q} = \operatorname{diag}[\mathbf{q}_1, \ldots, \mathbf{q}_K]$, and $\operatorname{tr}[\mathbf{Q}] \leq P_D$. The received symbol vector at the base station and the estimated symbol vector for user $k$ are
$$\mathbf{y}^{UL} = \mathbf{H}\mathbf{V}\sqrt{\mathbf{Q}}\mathbf{x} + \mathbf{n} = \sum_{k=1}^K \mathbf{H}_k \mathbf{V}_k \sqrt{\mathbf{Q}_k}\mathbf{x}_k + \mathbf{n}$$
and $\hat{\mathbf{x}}_k^{UL} = \mathbf{U}_k^H \mathbf{y}^{UL}$, respectively, with $\mathbf{n} \sim \mathcal{CN}(\mathbf{0}, \sigma_n^2 \mathbf{I}_M)$.



D. Robust Convex Minimum Sum-MSE Precoder Design 
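The MSE matrix for user $k$ in the virtual uplink can be written as
$$\boldsymbol{\varepsilon}_k^{UL} = E\left[(\hat{\mathbf{x}}_k^{UL} - \mathbf{x}_k)(\hat{\mathbf{x}}_k^{UL} - \mathbf{x}_k)^H\right] = \mathbf{U}_k^H \mathbf{R}\mathbf{U}_k - \mathbf{U}_k^H \hat{\mathbf{H}}_k \mathbf{V}_k \sqrt{\mathbf{Q}_k} - \sqrt{\mathbf{Q}_k}\mathbf{V}_k^H \hat{\mathbf{H}}_k^H \mathbf{U}_k + \mathbf{I}_{L_k}, \tag{2}$$
where $\mathbf{R} = \hat{\mathbf{H}}\mathbf{V}\mathbf{Q}\mathbf{V}^H\hat{\mathbf{H}}^H + \sigma_{\text{eff}}^2 \mathbf{I}_M$. Here, we have defined the effective noise power $\sigma_{\text{eff}}^2 = \sigma_n^2 + \sum_{k=1}^K \sigma_{E_k}^2 \operatorname{tr}[\mathbf{V}_k \mathbf{Q}_k \mathbf{V}_k^H]$ under the general model with different estimation error variances for each user $k$. We have also assumed the independence of data symbols, noise, and estimation errors. The optimum robust virtual uplink receiver for user $k$ is the MMSE (Wiener) filter $\mathbf{U}_k = \mathbf{R}^{-1}\hat{\mathbf{H}}_k\mathbf{V}_k\sqrt{\mathbf{Q}_k}$. The resulting (minimum) sum-MSE is
$$\text{SMSE}_{UL} = \sum_{k=1}^K \operatorname{tr}\left[\boldsymbol{\varepsilon}_k^{UL}\right] = L - \operatorname{tr}\left[\mathbf{R}^{-1}\hat{\mathbf{H}}\mathbf{V}\mathbf{Q}\mathbf{V}^H\hat{\mathbf{H}}^H\right] = L - M + \sigma_{\text{eff}}^2 \operatorname{tr}\left[\mathbf{R}^{-1}\right], \tag{3}$$
which follows from $\operatorname{tr}[\mathbf{A}\mathbf{B}] = \operatorname{tr}[\mathbf{B}\mathbf{A}]$, linearity of the trace operator, and the definition of $\mathbf{R}$. Since the beamforming vectors $\mathbf{v}_{kj}$ have unit norm, it follows that $\operatorname{tr}[\mathbf{V}_j\mathbf{Q}_j\mathbf{V}_j^H] = \sum_{l=1}^{L_j} q_{jl} = \|\mathbf{q}_j\|_1$ is the sum of powers allocated to user $j$'s data streams. Under a sum-power constraint with a maximum transmit power of $P_D$, the non-convex virtual uplink sum-MSE minimization problem can be formally defined as
$$(\mathbf{V}^*, \mathbf{Q}^*) = \arg\min_{\mathbf{V},\mathbf{Q}} \left(\sigma_n^2 + \sum_{k=1}^K \sigma_{E_k}^2 \|\mathbf{q}_k\|_1\right) \operatorname{tr}\left[\mathbf{R}^{-1}\right] \quad \text{s.t.} \quad q_{kl} \geq 0,\; k = 1, \ldots, K,\; l = 1, \ldots, L_k, \quad \operatorname{tr}[\mathbf{Q}] \leq P_D. \tag{4}$$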
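The identity behind the sum-MSE expression can be verified numerically. It should hold for arbitrary estimated channels, unit-norm beamformers and power allocations, including unequal error variances. The NumPy sketch below uses illustrative values throughout. It evaluates each per-user MSE matrix at the Wiener filter $\mathbf{U}_k = \mathbf{R}^{-1}\hat{\mathbf{H}}_k\mathbf{V}_k\sqrt{\mathbf{Q}_k}$, sums the traces, and compares the result with $L - M + \sigma_{\text{eff}}^2\operatorname{tr}[\mathbf{R}^{-1}]$.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, L = 4, [2, 2], [2, 2]
sigma_n2, sigma_E2 = 0.1, [0.05, 0.2]     # per-user error variances (general model)

def cn(shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

H_hat = [cn((M, Nk)) for Nk in N]                                  # Hhat_k, M x N_k
V = [cn((Nk, Lk)) for Nk, Lk in zip(N, L)]
V = [Vk / np.linalg.norm(Vk, axis=0, keepdims=True) for Vk in V]   # unit-norm v_kj
q = [rng.uniform(0.1, 1.0, Lk) for Lk in L]                        # arbitrary nonnegative powers

# sigma_eff^2 = sigma_n^2 + sum_k sigma_{E_k}^2 tr[V_k Q_k V_k^H]
sigma_eff2 = sigma_n2 + sum(s * np.trace(Vk @ np.diag(qk) @ Vk.conj().T).real
                            for s, Vk, qk in zip(sigma_E2, V, q))
G = [Hk @ Vk @ np.diag(np.sqrt(qk)) for Hk, Vk, qk in zip(H_hat, V, q)]   # Hhat_k V_k sqrt(Q_k)
R = sum(Gk @ Gk.conj().T for Gk in G) + sigma_eff2 * np.eye(M)

smse = 0.0
for Gk in G:
    Uk = np.linalg.solve(R, Gk)                   # Wiener filter R^{-1} Hhat_k V_k sqrt(Q_k)
    eps_k = Uk.conj().T @ R @ Uk - Uk.conj().T @ Gk - Gk.conj().T @ Uk + np.eye(Gk.shape[1])
    smse += np.trace(eps_k).real

closed_form = sum(L) - M + sigma_eff2 * np.trace(np.linalg.inv(R)).real
print(smse, closed_form)   # the two values agree to numerical precision
```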



When the channel estimation error variances are equal ($\sigma_{E_k}^2 = \sigma_E^2$), the effective noise becomes $\sigma_{\text{eff}}^2 = \sigma_n^2 + \sigma_E^2 \sum_{k=1}^K \|\mathbf{q}_k\|_1$. Since the minimum sum-MSE is a non-increasing function of $\sum_k \|\mathbf{q}_k\|_1$, we can assume that all available power allocated to data transmission will be used [?]. Thus, the effective noise can be further simplified as $\sigma_{\text{eff}}^2 = \sigma_n^2 + \sigma_E^2 P_D$ for the optimum precoder, which is no longer a function of the uplink power allocations $q_{kl}$. The optimization problem (4) thus becomes convex (the minimization of $\operatorname{tr}[\mathbf{R}^{-1}]$ under a sum power constraint), and can be solved using the algorithm from [?] designed for the perfect CSI case by substituting the effective noise $\sigma_{\text{eff}}^2$ for the noise term $\sigma_n^2$ in the original design.
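A small numerical example shows the role of the equal-variance assumption. The hypothetical helper `effective_noise` evaluates $\sigma_{\text{eff}}^2 = \sigma_n^2 + \sum_k \sigma_{E_k}^2\|\mathbf{q}_k\|_1$ for two allocations with the same total power $P_D = 2$. With equal error variances the result is identical for both. With unequal variances it depends on how power is split between users, and this dependence couples $\sigma_{\text{eff}}^2$ to $\mathbf{Q}$ in (4). All values are illustrative.

```python
import numpy as np

def effective_noise(sigma_n2, sigma_E2, q):
    """sigma_eff^2 = sigma_n^2 + sum_k sigma_{E_k}^2 ||q_k||_1 (unit-norm beamformers)."""
    return sigma_n2 + sum(s * np.sum(qk) for s, qk in zip(sigma_E2, q))

alloc_a = [np.array([1.5, 0.3]), np.array([0.1, 0.1])]     # total power 2.0
alloc_b = [np.array([0.25, 0.25]), np.array([0.75, 0.75])] # total power 2.0

equal = [0.1, 0.1]       # sigma_{E_k}^2 = sigma_E^2 for both users
unequal = [0.05, 0.3]
print(effective_noise(0.1, equal, alloc_a), effective_noise(0.1, equal, alloc_b))      # both 0.3 (up to rounding)
print(effective_noise(0.1, unequal, alloc_a), effective_noise(0.1, unequal, alloc_b))  # 0.25 vs 0.575
```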

III. Joint Optimization of Energy and Precoder Design

The previous section describes the design of a robust minimum sum-MSE precoder for a fixed data power allocation $P_D$. In this section, we extend this result by jointly optimizing the split of the available energy between training and data together with the precoder design. As explained in Section II-C, the optimum strategy for sharing the available data energy $E_D$ over $n_D$ transmitted symbols is to use equal energy in each transmission. Using this strategy, and substituting the estimation error variance from (1) into the effective noise variance, we define the joint optimization problem
$$(\mathbf{V}^*, \mathbf{Q}^*, E_T^*) = \arg\min_{\mathbf{V},\mathbf{Q},E_T} \sigma_{\text{eff}}^2 \operatorname{tr}\left[\mathbf{R}^{-1}\right] \quad \text{s.t.} \quad \operatorname{tr}[\mathbf{Q}] = P_D, \quad P_D = \frac{E_{\max} - E_T}{n_D}, \quad \sigma_{\text{eff}}^2 = \sigma_n^2 + \frac{\sigma_h^2\sigma_n^2 P_D}{\sigma_n^2 + \sigma_h^2 E_T/M}, \quad 0 \leq E_T \leq E_{\max}. \tag{5}$$
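Theorem 1: The optimum training energy is
$$E_T^* = \begin{cases} \dfrac{\sqrt{M}\left(E_{\max} - \frac{\sigma_n^2}{\sigma_h^2}\sqrt{M n_D}\right)}{\sqrt{M} + \sqrt{n_D}}, & E_{\max} > \frac{\sigma_n^2}{\sigma_h^2}\sqrt{M n_D}, \\ 0, & \text{otherwise}. \end{cases} \tag{6}$$

Proof: See Appendix B.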
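The following Python/NumPy sketch implements the theorem. As a cross-check it uses an observation that the text does not state but that follows from (3). Write $\mathbf{Q} = P_D\bar{\mathbf{Q}}$ with $\operatorname{tr}[\bar{\mathbf{Q}}] \leq 1$, and let $t = \sigma_{\text{eff}}^2/P_D$. The variable part of the sum-MSE is then $\sigma_{\text{eff}}^2\operatorname{tr}[\mathbf{R}^{-1}] = \sum_i t/(\lambda_i + t)$, where the $\lambda_i$ are the eigenvalues of $\hat{\mathbf{H}}\mathbf{V}\bar{\mathbf{Q}}\mathbf{V}^H\hat{\mathbf{H}}^H$. For a fixed precoder and a fixed $\hat{\mathbf{H}}$ (the same simplification made in Appendix B), this is non-decreasing in $t$. The optimum $E_T^*$ should therefore also minimize the scalar ratio $t$, which is consistent with Corollary 1. The code compares the closed form with a grid search over $t$. Helper names and parameter values are illustrative.

```python
import numpy as np

def optimum_training_energy(E_max, M, n_D, sigma_n2, sigma_h2):
    """Closed-form E_T^* of Theorem 1 (hypothetical helper name)."""
    p = sigma_n2 / sigma_h2
    if E_max <= p * np.sqrt(M * n_D):
        return 0.0                                   # at or below the training threshold
    return np.sqrt(M) * (E_max - p * np.sqrt(M * n_D)) / (np.sqrt(M) + np.sqrt(n_D))

def noise_to_power_ratio(E_T, E_max, M, n_D, sigma_n2, sigma_h2):
    """t = sigma_eff^2 / P_D, with sigma_E^2 from (1) and P_D = (E_max - E_T) / n_D."""
    P_D = (E_max - E_T) / n_D
    sigma_E2 = sigma_h2 * sigma_n2 / (sigma_n2 + sigma_h2 * E_T / M)
    return (sigma_n2 + sigma_E2 * P_D) / P_D

# Illustrative values: M = 4, n = 100 (n_D = n - M), E_max = n, sigma_h^2 = 1, sigma_n^2 = 0.1
M, n, sigma_h2, sigma_n2 = 4, 100, 1.0, 0.1
n_D, E_max = n - M, float(n)
E_T_star = optimum_training_energy(E_max, M, n_D, sigma_n2, sigma_h2)

# Brute-force search over the open interval 0 < E_T < E_max
grid = np.linspace(1e-6, E_max - 1e-6, 200001)
E_T_grid = grid[np.argmin(noise_to_power_ratio(grid, E_max, M, n_D, sigma_n2, sigma_h2))]
print(E_T_star, E_T_grid)   # agree to within the grid spacing (about 5e-4)
```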

Corollary 1: The optimization of the training/data energy allocation and the optimum precoder design in problem (5) are separable problems. This result can be seen directly in (6), as the optimum value of $E_T$ is a function of neither $\mathbf{V}$ nor $\mathbf{Q}$.

Corollary 2: The sum-MSE minimizing precoder can be designed using existing algorithms by setting the sum power constraint $\operatorname{tr}[\mathbf{Q}] \leq P_D = (E_{\max} - E_T^*)/n_D$ and replacing the noise power term with the effective noise power $\sigma_{\text{eff}}^2 = \sigma_n^2 + \sigma_E^2 P_D$, where $\sigma_E^2$ is given by (1) with $E_T = E_T^*$.

Corollary 3: No information can be communicated using the proposed algorithm in the case where $E_{\max} \leq \frac{\sigma_n^2}{\sigma_h^2}\sqrt{M n_D}$. If the total available energy fails to exceed this threshold, zero energy is allocated to training; as a result, the estimated channel is $\hat{\mathbf{H}} = \mathbf{0}$ and the resulting symbol estimates are $\hat{\mathbf{x}}^{DL} = \mathbf{0}$ as well. It is difficult to provide an intuitive understanding of this result, as we do not have a closed-form expression for the minimum sum-MSE as a function of $E_T$; however, we have observed in simulations that when $E_{\max}$ falls below the threshold, the resulting minimum sum-MSE is an increasing function of $E_T$. It follows that the "best" (i.e., sum-MSE minimizing) strategy is to avoid training.

We can reinterpret this threshold result in the context of average received SNR. If we define the average transmitted power as $P_{\text{avg}} = E_{\max}/n$, we can rewrite the condition of Corollary 3 as
$$\frac{P_{\text{avg}}\,\sigma_h^2}{\sigma_n^2} \leq \frac{\sqrt{M n_D}}{n_D + M}. \tag{7}$$
The right-hand side of (7) decays as $\sqrt{M/n_D}$, so as $n \to \infty$ a strictly positive optimum training power allocation is always feasible at any fixed average SNR. Furthermore, since $n_D + M \geq 2\sqrt{M n_D}$, the largest value that the average received SNR threshold can take on is $\text{SNR}_{\text{TH}} = 1/2 \approx -3$ dB, corresponding to the maximum value of the right-hand side of (7), which is attained when $n_D = M$.

Fig. 2. Optimum training power for varying block length $n$.

Fig. 3. Sum-MSE performance for equal and optimal energy allocations.
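The threshold in (7) and its maximum are easy to evaluate. This sketch computes the right-hand side of (7) in dB over $n_D$ for $M = 4$. It confirms the peak at $n_D = M$ and the decrease beyond it. The function name is hypothetical.

```python
import numpy as np

def snr_threshold(M, n_D):
    """Right-hand side of (7): received SNR P_avg*sigma_h^2/sigma_n^2 at or below which E_T^* = 0."""
    return np.sqrt(M * n_D) / (n_D + M)

M = 4
n_D = np.arange(1, 2001)
th_db = 10 * np.log10(snr_threshold(M, n_D))
print(n_D[np.argmax(th_db)], th_db.max())   # peak at n_D = M = 4, 10*log10(1/2), about -3.01 dB
print(th_db[n_D == 996])                    # n = 1000: roughly -12 dB
```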

IV. Numerical Examples 

We now present both analytical and simulation results to illustrate the behaviour and performance of the proposed algorithm. In these results, the flat Rayleigh fading channels are modelled with $\sigma_h^2 = 1$. We scale the total energy $E_{\max}$ proportionally to the block length $n$ to reflect a realistic average power constraint, $P_{\text{avg}} = E_{\max}/n = \alpha$; in these simulations, we illustrate the case of $\alpha = 1$. As such, we define the average transmit SNR as $P_{\text{avg}}/\sigma_n^2$, and obtain different SNR values by varying the noise power $\sigma_n^2$. These preliminary results illustrate performance in a system with $K = 2$ users, $M = 4$ base station antennas, and $N_1 = N_2 = L_1 = L_2 = 2$ receive antennas and data streams per user.

Figure 2 illustrates how the optimum power allocated to training, $P_T^* = E_T^*/M$, grows with average SNR and with block length $n$. We observe that as $n$ grows, the optimum power allocated to training becomes significantly larger than the equal power allocation $P_T = 1$; however, $P_T^*$ converges fairly rapidly with increasing SNR. We also observe the threshold behaviour described in Corollary 3.
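The trends described for Fig. 2 can be reproduced from (6) alone, without simulating channels. The sketch below assumes the setup of this section: $\sigma_h^2 = 1$ and $\alpha = 1$, so $E_{\max} = n$ and $\sigma_n^2 = 1/\text{SNR}$, with $M = 4$. It prints $P_T^* = E_T^*/M$ for several block lengths and SNRs but does not reproduce the figure itself. The helper name is hypothetical.

```python
import numpy as np

def optimum_training_power(n, M, snr_db, alpha=1.0, sigma_h2=1.0):
    """P_T^* = E_T^*/M from (6) with E_max = alpha*n, n_D = n - M, sigma_n^2 = alpha/SNR."""
    n_D, E_max = n - M, alpha * n
    sigma_n2 = alpha / 10 ** (np.asarray(snr_db) / 10)
    p = sigma_n2 / sigma_h2
    E_T = np.sqrt(M) * (E_max - p * np.sqrt(M * n_D)) / (np.sqrt(M) + np.sqrt(n_D))
    return np.maximum(E_T, 0.0) / M          # clipped to zero at or below the Corollary 3 threshold

M = 4
snr_db = np.array([-10.0, 0.0, 10.0, 20.0, 30.0])
for n in (10, 100, 1000):
    print(n, np.round(optimum_training_power(n, M, snr_db), 2))
```

Under these assumptions, $P_T^*$ is zero at $-10$ dB for the two shorter blocks (Corollary 3). It grows with $n$ at a fixed SNR and changes very little between 20 and 30 dB.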

Figures 3 and 4 illustrate the sum-MSE and average BER performance of the proposed algorithm. Results in each of these plots are generated using 5000 channel realizations per average SNR value, and data symbols are generated as uncoded QPSK. Here, we compare the performance of the proposed algorithm to the case where equal power is allocated to both training and data symbols (i.e., $P_T = P_D = 1$). We observe notable performance improvements for large block lengths ($n \gg M$), with approximately 3 dB of SNR gain for $n = 1000$.

Fig. 4. Average BER performance for equal and optimal energy allocations.

V. Conclusions 

In this paper, we have considered the problem of allocating energy to training and data symbols for systems using minimum sum-MSE linear precoding in the multiuser MIMO downlink. We have derived the optimum closed-form energy allocation for the case of MMSE channel estimation when all users have statistically identical channels. Furthermore, we have proven separability of the energy allocation and precoder designs; thus, existing algorithms for minimum sum-MSE precoding can be applied following energy optimization. Preliminary simulation results demonstrate that significant performance improvements can be achieved for realistic channel coherence intervals and transmit SNR levels.

Appendix A 
MMSE Channel Estimation Error 

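The minimum MSE matrix for the estimation of $\mathbf{h}$, with $\hat{\mathbf{h}}_{\text{MMSE}} = \mathbf{A}_0^H\mathbf{y}_T$ and $\mathbf{y}_T = \mathbf{X}_T^H\mathbf{h} + \mathbf{z}$, can be written as
$$\begin{aligned} E\left[(\hat{\mathbf{h}}_{\text{MMSE}} - \mathbf{h})(\hat{\mathbf{h}}_{\text{MMSE}} - \mathbf{h})^H\right] &= E\left[\left((\mathbf{A}_0^H\mathbf{X}_T^H - \mathbf{I}_M)\mathbf{h} + \mathbf{A}_0^H\mathbf{z}\right)\left((\mathbf{A}_0^H\mathbf{X}_T^H - \mathbf{I}_M)\mathbf{h} + \mathbf{A}_0^H\mathbf{z}\right)^H\right] \\ &= \sigma_h^2(\mathbf{A}_0^H\mathbf{X}_T^H - \mathbf{I}_M)(\mathbf{X}_T\mathbf{A}_0 - \mathbf{I}_M) + \sigma_n^2\mathbf{A}_0^H\mathbf{A}_0 \\ &= \sigma_h^2\left[\mathbf{I}_M - \mathbf{X}_T\left(\mathbf{X}_T^H\mathbf{X}_T + \frac{\sigma_n^2}{\sigma_h^2}\mathbf{I}_{n_T}\right)^{-1}\mathbf{X}_T^H\right] \\ &= \left(\sigma_h^{-2}\mathbf{I}_M + \sigma_n^{-2}\mathbf{X}_T\mathbf{X}_T^H\right)^{-1} \\ &= \frac{\sigma_h^2\sigma_n^2}{\sigma_n^2 + \sigma_h^2 E_T/M}\mathbf{I}_M, \end{aligned} \tag{8}$$
where we have assumed that $\mathbf{h}$ and $\mathbf{z}$ are independent, and the last equality uses $\mathbf{X}_T\mathbf{X}_T^H = \frac{E_T}{M}\mathbf{I}_M$. The fourth equality follows from application of the matrix inversion lemma, $(\mathbf{A} + \mathbf{B}\mathbf{C}\mathbf{D})^{-1} = \mathbf{A}^{-1} - \mathbf{A}^{-1}\mathbf{B}\left(\mathbf{C}^{-1} + \mathbf{D}\mathbf{A}^{-1}\mathbf{B}\right)^{-1}\mathbf{D}\mathbf{A}^{-1}$. The estimation error $\hat{\mathbf{h}}_{\text{MMSE}} - \mathbf{h}$ is a linear function of the jointly Gaussian vectors $\mathbf{h}$ and $\mathbf{z}$, so it is Gaussian. Its covariance is a scaled identity matrix, so its components are uncorrelated and hence independent Gaussian random variables.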
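The matrix-inversion-lemma step does not rely on the training matrix having orthogonal rows. The NumPy sketch below draws an arbitrary random $\mathbf{X}_T$ with $n_T = 6$. It checks that three forms of the error covariance coincide: the form written in terms of $\mathbf{A}_0$, the form $\sigma_h^2[\mathbf{I}_M - \mathbf{X}_T(\mathbf{X}_T^H\mathbf{X}_T + \frac{\sigma_n^2}{\sigma_h^2}\mathbf{I}_{n_T})^{-1}\mathbf{X}_T^H]$, and $(\sigma_h^{-2}\mathbf{I}_M + \sigma_n^{-2}\mathbf{X}_T\mathbf{X}_T^H)^{-1}$. Parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
M, n_T, sigma_h2, sigma_n2 = 4, 6, 1.0, 0.3
X = (rng.standard_normal((M, n_T)) + 1j * rng.standard_normal((M, n_T))) / np.sqrt(2)  # arbitrary X_T
I_M, I_T = np.eye(M), np.eye(n_T)

A0 = np.linalg.solve(X.conj().T @ X + (sigma_n2 / sigma_h2) * I_T, X.conj().T)   # linear MMSE estimator
B = A0.conj().T @ X.conj().T - I_M
cov_A0 = sigma_h2 * B @ B.conj().T + sigma_n2 * A0.conj().T @ A0                # in terms of A_0
cov_proj = sigma_h2 * (I_M - X @ np.linalg.solve(X.conj().T @ X + (sigma_n2 / sigma_h2) * I_T, X.conj().T))
cov_mil = np.linalg.inv(I_M / sigma_h2 + X @ X.conj().T / sigma_n2)             # after the inversion lemma
print(np.allclose(cov_A0, cov_proj), np.allclose(cov_proj, cov_mil))           # True True
```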

Appendix B 

Optimum Training and Data Energy Allocation 

Here, we derive a closed-form expression for the optimum training energy that minimizes the sum-MSE of the precoder design under a sum-energy constraint, $E_T + E_D \leq E_{\max}$. Due to space limitations, we only show the most common case of long blocks (with $n \gg M$, and consequently $n_D > M$); however, the identical result applies for $n_D \leq M$.

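We perform the optimization in terms of the training power $P_T = E_T/M$. Using the virtual uplink sum-MSE from (3) as the objective function, and the energy constraints $E_T \geq 0$ and $E_T \leq E_{\max}$, we derive the Karush-Kuhn-Tucker (KKT) conditions
$$\frac{\partial\,\text{SMSE}_{UL}}{\partial P_T} + \lambda_{\max} M - \lambda_+ M = 0, \tag{9}$$
$$P_T M \geq 0, \qquad P_T M \leq E_{\max}, \tag{10}$$
$$\lambda_+ \geq 0, \qquad \lambda_{\max} \geq 0, \tag{11}$$
$$\lambda_+ P_T M = 0, \qquad \lambda_{\max}\left(P_T M - E_{\max}\right) = 0. \tag{12}$$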



We consider only the solutions where the constraints are not binding, as allowing either constraint to hold with equality prevents us from reaching a global minimum of the optimization problem. When $P_T M = 0$, no training symbols are sent, and the resulting channel estimate is $\hat{\mathbf{H}} = \mathbf{0}$. If $P_T M = E_{\max}$, zero energy remains for data transmission. In either of these cases, the resulting data symbol estimates are $\hat{\mathbf{x}}^{DL} = \mathbf{0}$, and no information can be communicated. Since neither constraint is binding, complementary slackness (12) requires that $\lambda_{\max} = \lambda_+ = 0$; thus, any minimizer can be found by considering the unconstrained minimization of $\text{SMSE}_{UL}$ and checking the feasibility of the resulting solutions.
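We begin by rewriting the effective noise power,
$$\sigma_{\text{eff}}^2 = \sigma_n^2\left(1 + \frac{E_{\max} - P_T M}{n_D\left(p + P_T\right)}\right), \tag{13}$$
with $p = \sigma_n^2/\sigma_h^2$. Define the derivative
$$D_\sigma = \frac{\partial\sigma_{\text{eff}}^2}{\partial P_T} = -\frac{\sigma_n^2\left(E_{\max} + pM\right)}{n_D\left(p + P_T\right)^2}. \tag{14}$$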
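The effective noise expression and its derivative can be cross-checked against the direct definition $\sigma_{\text{eff}}^2 = \sigma_n^2 + \sigma_E^2P_D$, with $\sigma_E^2 = \sigma_h^2\sigma_n^2/(\sigma_n^2 + \sigma_h^2P_T)$ and $P_D = (E_{\max} - MP_T)/n_D$. The check uses a central finite difference. Parameter values are illustrative.

```python
import numpy as np

sigma_h2, sigma_n2, M, n_D, E_max = 1.0, 0.2, 4, 96, 100.0
p = sigma_n2 / sigma_h2

def sigma_eff2_direct(P_T):
    """sigma_n^2 + sigma_E^2 P_D with E_T = M P_T."""
    sigma_E2 = sigma_h2 * sigma_n2 / (sigma_n2 + sigma_h2 * P_T)
    return sigma_n2 + sigma_E2 * (E_max - M * P_T) / n_D

def sigma_eff2_13(P_T):
    return sigma_n2 * (1 + (E_max - P_T * M) / (n_D * (p + P_T)))   # rewritten form (13)

def D_sigma_14(P_T):
    return -sigma_n2 * (E_max + p * M) / (n_D * (p + P_T) ** 2)       # derivative (14)

P_T, h = 3.0, 1e-5
fd = (sigma_eff2_direct(P_T + h) - sigma_eff2_direct(P_T - h)) / (2 * h)   # central difference
print(np.isclose(sigma_eff2_direct(P_T), sigma_eff2_13(P_T)), np.isclose(fd, D_sigma_14(P_T)))  # True True
```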

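We then separate the data power $P_D$ from the uplink power allocation by rewriting $\mathbf{Q} = P_D\bar{\mathbf{Q}}$, with associated sum power constraint $\operatorname{tr}[\bar{\mathbf{Q}}] \leq 1$. It follows, using $\partial P_D/\partial P_T = -M/n_D$, that
$$\mathbf{R} = P_D\hat{\mathbf{H}}\mathbf{V}\bar{\mathbf{Q}}\mathbf{V}^H\hat{\mathbf{H}}^H + \sigma_{\text{eff}}^2\mathbf{I}_M, \qquad \frac{\partial\mathbf{R}}{\partial P_T} = -\frac{M}{n_D}\hat{\mathbf{H}}\mathbf{V}\bar{\mathbf{Q}}\mathbf{V}^H\hat{\mathbf{H}}^H + \frac{\partial\sigma_{\text{eff}}^2}{\partial P_T}\mathbf{I}_M. \tag{15}$$
Define the derivative of the trace function
$$D_{\text{tr}} = \frac{\partial\operatorname{tr}\left[\mathbf{R}^{-1}\right]}{\partial P_T} = -\operatorname{tr}\left[\mathbf{R}^{-1}\frac{\partial\mathbf{R}}{\partial P_T}\mathbf{R}^{-1}\right]. \tag{16}$$
The candidate values of $P_T$ for unconstrained global optimality satisfy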

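$$\frac{\partial\,\text{SMSE}_{UL}}{\partial P_T} = D_\sigma\operatorname{tr}\left[\mathbf{R}^{-1}\right] + \sigma_{\text{eff}}^2 D_{\text{tr}} = P_D\operatorname{tr}\left[\mathbf{R}^{-1}\hat{\mathbf{H}}\mathbf{V}\bar{\mathbf{Q}}\mathbf{V}^H\hat{\mathbf{H}}^H\mathbf{R}^{-1}\right]\left(D_\sigma + \frac{M\sigma_{\text{eff}}^2}{n_D P_D}\right) = 0, \tag{17}$$
where $D_\sigma = \partial\sigma_{\text{eff}}^2/\partial P_T$ as in (14), and the second equality uses $\operatorname{tr}[\mathbf{R}^{-1}] - \sigma_{\text{eff}}^2\operatorname{tr}[\mathbf{R}^{-2}] = P_D\operatorname{tr}[\mathbf{R}^{-1}\hat{\mathbf{H}}\mathbf{V}\bar{\mathbf{Q}}\mathbf{V}^H\hat{\mathbf{H}}^H\mathbf{R}^{-1}]$. The first term, $P_D\operatorname{tr}[\mathbf{R}^{-1}\hat{\mathbf{H}}\mathbf{V}\bar{\mathbf{Q}}\mathbf{V}^H\hat{\mathbf{H}}^H\mathbf{R}^{-1}]$, can be rewritten as $\operatorname{tr}[\mathbf{R}^{-1}\hat{\mathbf{H}}\mathbf{V}\mathbf{Q}\mathbf{V}^H\hat{\mathbf{H}}^H\mathbf{R}^{-1}]$. This term has only a trivial zero at $P_T = E_{\max}/M$ (corresponding to $P_D = 0$), because the argument of the trace function is positive semidefinite and nonzero for non-zero power allocations $\mathbf{Q}$, so its trace is strictly positive. Any globally optimum $P_T^*$ must therefore satisfy
$$D_\sigma = -\frac{M\sigma_{\text{eff}}^2}{n_D P_D}. \tag{18}$$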

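Substituting the definitions (13) and (14), together with $P_D = (E_{\max} - P_T M)/n_D$, into (18) gives rise to the following quadratic equation in $P_T$,
$$P_T^2\left(n_D - M\right) + 2P_T\left(E_{\max} + pn_D\right) = \frac{E_{\max}^2}{M} - p^2 n_D. \tag{19}$$
The two roots of this quadratic equation are
$$P_T = \frac{1}{n_D - M}\left(-E_{\max} - pn_D \pm \gamma\right), \tag{20}$$
with
$$\gamma = \sqrt{n_D\left(p^2 M + 2pE_{\max} + \frac{E_{\max}^2}{M}\right)}. \tag{21}$$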

Clearly, for $n_D > M$, the negative root ($-\gamma$) results in an infeasible solution $P_T < 0$. Since $\gamma = \sqrt{n_D/M}\,(E_{\max} + pM)$, the positive root simplifies to
$$P_T^* = \frac{E_{\max}/\sqrt{M} - p\sqrt{n_D}}{\sqrt{M} + \sqrt{n_D}}. \tag{22}$$
This solution always satisfies $P_T^* M < E_{\max}$, and is only infeasible (with $P_T^* < 0$) if $E_{\max} < p\sqrt{n_D M}$.
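The algebra leading to the closed-form positive root can be spot-checked numerically. The sketch solves the quadratic (19) with `numpy.roots` and compares its positive root with (22). It then confirms that the stationarity condition $\partial\sigma_{\text{eff}}^2/\partial P_T = -M\sigma_{\text{eff}}^2/(n_DP_D)$ holds there, and that $\gamma = \sqrt{n_D/M}\,(E_{\max} + pM)$. The parameter values are arbitrary assumptions with $n_D > M$.

```python
import numpy as np

M, n_D, E_max, sigma_n2, sigma_h2 = 4, 60, 64.0, 0.5, 1.0
p = sigma_n2 / sigma_h2

# Quadratic (19): (n_D - M) P_T^2 + 2 (E_max + p n_D) P_T - (E_max^2 / M - p^2 n_D) = 0
roots = np.roots([n_D - M, 2 * (E_max + p * n_D), -(E_max ** 2 / M - p ** 2 * n_D)])
P_T_pos = roots.real.max()
P_T_22 = (E_max / np.sqrt(M) - p * np.sqrt(n_D)) / (np.sqrt(M) + np.sqrt(n_D))   # closed form (22)

# Stationarity: d sigma_eff^2 / d P_T + M sigma_eff^2 / (n_D P_D) = 0 at the optimum
P_D = (E_max - M * P_T_22) / n_D
sigma_eff2 = sigma_n2 * (1 + (E_max - P_T_22 * M) / (n_D * (p + P_T_22)))
D_sigma = -sigma_n2 * (E_max + p * M) / (n_D * (p + P_T_22) ** 2)
print(np.isclose(P_T_pos, P_T_22), np.isclose(D_sigma, -M * sigma_eff2 / (n_D * P_D)))  # True True
```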

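Finally, we prove that this stationary point $P_T^*$ is indeed a global minimum. We observe that the second derivative of $\text{SMSE}_{UL}$ can be written as
$$\frac{\partial^2\text{SMSE}_{UL}}{\partial P_T^2} = P_D\operatorname{tr}\left[\mathbf{R}^{-1}\hat{\mathbf{H}}\mathbf{V}\bar{\mathbf{Q}}\mathbf{V}^H\hat{\mathbf{H}}^H\mathbf{R}^{-1}\right]\frac{\partial}{\partial P_T}\left(D_\sigma + \frac{M\sigma_{\text{eff}}^2}{n_D P_D}\right) + \frac{\partial}{\partial P_T}\left(P_D\operatorname{tr}\left[\mathbf{R}^{-1}\hat{\mathbf{H}}\mathbf{V}\bar{\mathbf{Q}}\mathbf{V}^H\hat{\mathbf{H}}^H\mathbf{R}^{-1}\right]\right)\left(D_\sigma + \frac{M\sigma_{\text{eff}}^2}{n_D P_D}\right), \tag{23}$$
where $D_\sigma = \partial\sigma_{\text{eff}}^2/\partial P_T$ as in (14). The second term vanishes at $P_T^*$ due to (18). We previously showed that the trace term is strictly positive; thus, to prove that $P_T^*$ is a global minimizer, we must only show that the remaining derivative in the second derivative is positive at $P_T^*$:
$$\frac{\partial}{\partial P_T}\left(D_\sigma + \frac{M\sigma_{\text{eff}}^2}{n_D P_D}\right) = \frac{\partial D_\sigma}{\partial P_T} + \frac{M}{n_D P_D}\left(D_\sigma + \frac{M\sigma_{\text{eff}}^2}{n_D P_D}\right). \tag{24}$$
At the point $P_T = P_T^*$, the second term of (24) vanishes due to (18). The remaining term
$$\left.\frac{\partial D_\sigma}{\partial P_T}\right|_{P_T = P_T^*} = \frac{2\sigma_n^2\left(E_{\max} + pM\right)}{n_D\left(p + P_T^*\right)^3}$$
is positive. Since $P_T^*$ is the only stationary point in the feasible interval $0 < P_T < E_{\max}/M$, the derivative (17) changes sign from negative to positive there; thus, the training power $P_T^*$ is the global minimizer.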
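The global-minimality argument can also be illustrated numerically. The trace factor in the first derivative is positive, so the sign of $\partial\,\text{SMSE}_{UL}/\partial P_T$ equals the sign of the bracketed factor $\partial\sigma_{\text{eff}}^2/\partial P_T + M\sigma_{\text{eff}}^2/(n_DP_D)$. The sketch below evaluates that factor on a grid over the feasible interval $0 < P_T < E_{\max}/M$. It checks that the factor is negative before $P_T^*$ and positive after it, and that the surviving second-derivative term is positive. Parameter values are illustrative.

```python
import numpy as np

M, n_D, E_max, sigma_n2, sigma_h2 = 4, 96, 100.0, 0.1, 1.0
p = sigma_n2 / sigma_h2

def g(P_T):
    """Bracketed factor of the first derivative: d sigma_eff^2/dP_T + M sigma_eff^2 / (n_D P_D)."""
    P_D = (E_max - M * P_T) / n_D
    sigma_eff2 = sigma_n2 * (1 + (E_max - P_T * M) / (n_D * (p + P_T)))
    D_sigma = -sigma_n2 * (E_max + p * M) / (n_D * (p + P_T) ** 2)
    return D_sigma + M * sigma_eff2 / (n_D * P_D)

P_T_star = (E_max / np.sqrt(M) - p * np.sqrt(n_D)) / (np.sqrt(M) + np.sqrt(n_D))
P = np.linspace(1e-3, E_max / M - 1e-3, 10001)        # feasible interior 0 < P_T < E_max/M
vals = g(P)
print(np.all(vals[P < P_T_star] < 0), np.all(vals[P > P_T_star] > 0))   # True True

# Surviving second-derivative term at P_T^*: d D_sigma / d P_T, which is positive
print(2 * sigma_n2 * (E_max + p * M) / (n_D * (p + P_T_star) ** 3) > 0)  # True
```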